A continuous-time model for multiagent systems governed by reinforcement learning with scale-free memory is developed. The agents are assumed to act independently of one another in optimizing their choice of possible actions via trial-and-error search. To gain awareness of the action value, the agents accumulate in their memory the rewards obtained from taking a specific action at each moment of time. The contribution of past rewards to an agent's current perception of action value is described by an integral operator with a power-law kernel. Finally, a fractional differential equation governing the system dynamics is obtained. The agents are considered to interact with one another implicitly, via the reward of one agent depending on the choices of the other agents. The pairwise interaction model is adopted to describe this effect. As a specific example of systems with non-transitive interactions, two-agent and three-agent systems of the rock-paper-scissors type are analyzed in detail, including stability analysis and numerical simulation. Scale-free memory is demonstrated to cause complex dynamics of the systems at hand. In particular, it is shown that there can be simultaneously two modes of system instability undergoing subcritical and supercritical bifurcation, with the latter exhibiting anomalous oscillations whose amplitude and period grow with time. Moreover, the instability onset via this supercritical mode may be regarded as "altruism self-organization". For the three-agent system the instability dynamics is found to be rather irregular and can be composed of alternating fragments of oscillations differing in their properties.
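As a minimal illustration of how a power-law memory kernel leads to a fractional evolution law, the following LaTeX sketch uses assumed notation ($Q_i(t)$ for the perceived value of action $i$, $r_i(t)$ for the instantaneous reward, and a memory exponent $0<\alpha<1$); it reflects only the general structure stated above, not the paper's exact equations.

% Sketch under assumed notation: accumulated value with scale-free (power-law) memory
\begin{align}
  Q_i(t) &\;\propto\; \int_0^{t} \frac{r_i(t')}{(t-t')^{1-\alpha}}\,dt'
          \;=\; \Gamma(\alpha)\,\bigl({}_0 D_t^{-\alpha} r_i\bigr)(t),\\
  % Inverting the memory integral with a fractional derivative of order \alpha
  % gives a fractional differential equation of the schematic form
  {}_0 D_t^{\alpha} Q_i(t) &\;\propto\; r_i\bigl(Q_1(t),\ldots,Q_N(t)\bigr),
\end{align}

where the reward $r_i$ of one agent depends on the current choices of the others, which is the implicit pairwise coupling referred to in the abstract.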